perm filename MSORT.DON[UP,DOC] blob sn#450840 filedate 1979-06-17 generic text, type T, neo UTF8
MSORT is a string-sort program with minimal features, designed for use
in certain (relatively common) circumstances in which SSORT fails badly.
In most cases you are better off using SSORT.  (See SSORT.REM[UP,DOC]
for further info.)

The salient features of MSORT are:
(1) it always sorts by lines (i.e., uses linefeed as the delimiter)
(2) it always flushes an ETV directory or line numbers, if present
(3) it always retains duplicate records (/R switch in SSORT)
(4) it retains null records (SSORT currently flushes them)
(5) it uses the ASCII collating sequence; in particular, upper- and lower-case
    do NOT sort together
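
As a rough illustration of points (1), (3), (4) and (5), here is a minimal
Python sketch of the ordering described above.  It is not MSORT's code;
the function name msort_lines is hypothetical, and point (2), the removal
of an ETV directory or line numbers, is not modelled.

```python
def msort_lines(text):
    """Sort text by lines, keeping duplicates and empty lines.

    Hypothetical helper that illustrates the behaviour listed above;
    it is not taken from MSORT itself.
    """
    lines = text.split("\n")
    # A trailing linefeed ends the last record; it does not start a new one.
    if lines and lines[-1] == "":
        lines.pop()
    # sorted() keeps duplicates and empty strings.  Python compares strings
    # by code point, which is ASCII order for ASCII text, so upper-case
    # letters sort before all lower-case letters.
    return sorted(lines)


sample = "pear\nApple\n\napple\npear\nZebra\n"
print(msort_lines(sample))
# prints ['', 'Apple', 'Zebra', 'apple', 'pear', 'pear']
```

The null record sorts first, both copies of "pear" survive, and "Zebra"
precedes "apple" because upper- and lower-case do not sort together.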

The advantage of MSORT over SSORT is that MSORT is moderately efficient on
large, randomly-ordered files.  SSORT uses an internal paging scheme that
fails miserably if it has to access widely scattered lines.  On a typical
test (using a pocket dictionary file of 21112 words, in which each word had
the first two letters removed and was then reversed), SSORT required 34029
disk accesses and took about 90 minutes realtime (32.4 Ebox msecs, 4:10'46
wholine), whereas MSORT took 105 disk accesses and about 1 minute realtime
(16.9 Ebox msecs, 0:30'10 wholine).

MSORT reads the input file in n-line chunks, sorts each chunk in core using
quicksort, and writes the chunks into temporary files that are then merged
by a 15-way merge.  The size of each chunk is based on available core and the
expected maximum length of the input lines (as specified by the user; the
default is 1000 characters).  If the user is able to specify the number of
input lines to within 5%, MSORT will make the chunks smaller so that the
quicksorts (which take O(n log n) time) deal with less data and the burden
is borne instead by the linear-time merge.
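
The chunk-and-merge scheme described above can be sketched in Python as
follows.  This is an illustration under stated assumptions, not MSORT's
implementation: the names external_sort, _write_run and _read_run are
hypothetical, Python's built-in sorted() stands in for quicksort, the chunk
size is passed in directly rather than derived from available core, and
merging in repeated passes when there are more than 15 chunks is an
assumption that the text does not confirm.

```python
import heapq
import tempfile

MERGE_WAYS = 15  # merge order stated in the text


def _write_run(sorted_lines):
    """Write one sorted run to a temporary file and rewind it."""
    f = tempfile.TemporaryFile(mode="w+", encoding="utf-8", newline="\n")
    f.writelines(line + "\n" for line in sorted_lines)
    f.seek(0)
    return f


def _read_run(f):
    """Yield the lines of a run without their trailing linefeed."""
    for line in f:
        yield line[:-1]


def external_sort(lines, chunk_lines):
    """Sort lines (strings without linefeeds) via sorted chunks and merging."""
    runs, chunk = [], []
    for line in lines:
        chunk.append(line)
        if len(chunk) == chunk_lines:
            # Sort the chunk in memory (sorted() here, quicksort in MSORT).
            runs.append(_write_run(sorted(chunk)))
            chunk = []
    if chunk:
        runs.append(_write_run(sorted(chunk)))

    # Assumption: with more than MERGE_WAYS runs, merge them in groups
    # of MERGE_WAYS into longer runs until one final merge suffices.
    while len(runs) > MERGE_WAYS:
        merged = []
        for i in range(0, len(runs), MERGE_WAYS):
            group = runs[i:i + MERGE_WAYS]
            merged.append(_write_run(heapq.merge(*map(_read_run, group))))
            for f in group:
                f.close()
        runs = merged

    result = list(heapq.merge(*map(_read_run, runs)))
    for f in runs:
        f.close()
    return result
```

For example, external_sort(["b", "", "a", "b"], 2) writes two sorted runs
and merges them, keeping the null record and the duplicate.  A real
implementation would stream the final merge to the output file rather than
build a list in memory.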